Abstract

The article describes a reconstruction pipeline that generates piecewise-planar models of man-made environments using two cali- 
brated views. The 3D space is sampled by a set of pre-defined virtual cut planes that pass in between cameras and implicitly define 
possible pixel correspondences across views. The likelihood of these correspondences being true matches is measured using signal 
symmetry analysis [1], which yields profile contours of the 3D scene that become lines whenever the virtual cut planes
intersect planar surfaces. The detection and estimation of these line cuts is formulated as a global optimization problem over
the symmetry matching cost, and pairs of reconstructed lines are used to generate plane hypotheses that serve as input to PEARL 
clustering [2]. The PEARL algorithm alternates between a discrete optimization step, which merges planar surface hypotheses 
and discards detections with poor support, and a continuous optimization step, which refines the plane poses taking into account 
surface slant. The pipeline outputs an accurate semi-dense PPR of the 3D scene. In addition, the input images can be segmented 
into piecewise-planar regions using a standard MRF formulation for assigning pixels to plane detections. Extensive experiments 
with both indoor and outdoor stereo pairs show significant improvements over state-of-the-art methods with respect to accuracy and 
robustness. 
1. Introduction

Stereo cameras are becoming increasingly popular because 
of the recent advent of 3D visualization and display. A few 
years ago they were considered special purpose devices that 
could only be found in research labs and high-end equipment,
but nowadays they are a consumer electronics product being 
available either as standalone hand-held cameras (e.g. Fujifilm 
Finepix 3D, Sony Bloggie, etc), or integrated into smart-phones 
(e.g. HTC Evo 3D). Our work is motivated by this proliferation 
of stereo cameras, which we believe will create a demand for robust
algorithms able to render complete, photo-realistic 3D models 
in an automatic manner. 

Stereo reconstruction is a classical problem in computer and
robot vision that has received the attention of thousands of authors
[3, 4]. Despite the many advances in the field, situations of
poor texture, variable illumination, severe surface slant or oc- 
clusion are still challenging for most stereo matching methods, 
making it difficult to find a tuning that provides good results un- 
der a broad variety of acquisition circumstances [5]. Since man- 
made environments are dominated by planar surfaces, several 
authors have proposed overcoming the above-mentioned difficulties
by using the planarity assumption as a prior for the stereo
reconstruction [6, 7, 8, 9, 10]. These approaches have the ad- 
vantage of providing piecewise-planar 3D models of the scene 
that are perceptually pleasing and geometrically simple, and, 
thus, their rendering, storage and transmission is computation- 
ally less complex. This article proposes a pipeline for two-view 
Piecewise-Planar Reconstruction (PPR), understood as the
detection and reconstruction of dominant planar surfaces in the
scene (see footnote 1).

PPR is to a large extent a chicken-and-egg problem. If there
is accurate 3D evidence about the scene, such as points, lines, 
vanishing directions, etc, then the problem of detecting, seg- 
menting, and estimating the pose of dominant planes can be po- 
tentially solved using standard model fitting techniques [13, 2]. 
On the other hand, if there is prior knowledge about the dominant
planes in the scene, then the matching process can be
constrained to improve the accuracy of the final 3D reconstruc- 
tion, e.g. the known plane orientations can be used to guide the 
stereo aggregation, as done in [11]. Existing methods for PPR 
typically comprise three steps that are executed sequentially: 

1. 3D Reconstruction: The objective is to collect 3D evi- 
dence about the scene from multiple views. This evidence 
can either be obtained from sparse stereo that matches a 
sparse set of features across views (e.g. [8, 9]), or from 
dense stereo that performs dense data association between 
frames by assigning to each image pixel a disparity value 
(e.g. [10]). 

2. Plane Hypotheses Generation: Given the 3D data, the ob- 
jective is to detect and estimate the pose of planar surfaces 
using some sort of multi-model fitting approach. 

3. Plane Labeling: The goal is to assign to each image 
pixel one of the plane hypotheses generated in the previous 
step. This is usually done using a Markov Random Field
(MRF) framework with photo-consistency being used as the
data term.

Footnote 1: By PPR we mean something different from approximating surfaces
by small planes, as is typically done in several dense stereo methods (e.g. [11, 12]).

While most methods were originally designed to receive mul- 
tiple views [6, 14, 7, 8, 9, 10], we propose a pipeline that uses 
only two views and makes no assumptions about the scene other 
than the fact of being dominated by planar surfaces. The nov- 
elty is mainly in the steps of 3D Reconstruction and Plane Hy- 
pothesis Generation, and the contributions can be summarized 
as follows: 

• Reconstruction of line cuts using Stereo from Induced
Symmetry (SymStereo): Establishing dense stereo correspondence
is computationally expensive, especially when
dealing with high-resolution images. On the other hand,
sparse stereo applied to only two views tends to provide
insufficient 3D data for establishing accurate plane hypotheses.
Thus, we propose to carry out a semi-dense reconstruction
of the scene by independently recovering depth along a set
of pre-defined virtual planes using SymStereo [1]. This
approach is known as Stereo-Rangefinding, because the
results are profile cuts of the scene similar to the ones that
would be obtained using a Laser-Rangefinder [15]. Since
the intersections of the virtual scan planes with planes
contained in the scene are lines, we extract line segments from
the profile cuts and use these line cuts to generate plane
hypotheses [16].

• Improving SymStereo accuracy in the case of surface 
slant: In a similar manner to what happens in conventional 
stereo, surface slant affects the depth estimation obtained 
from SymStereo. In this case, the line cuts are poorly re- 
constructed, which also implies that the detected planes 
are inaccurate. We study the problem of surface slant in 
the context of the SymStereo framework, and devise a 
simple solution that enables an accurate reconstruction of 
highly slanted planes. 

• Global plane fitting using PEARL [2]: Most methods for 
PPR treat stereo matching and plane detection in a sequen- 
tial and independent manner [6, 14, 7, 8, 9, 10]. This is 
problematic because the accuracy of the plane hypotheses 
is inevitably limited by the accuracy of the initial 3D re- 
construction that does not take into account the fact of the 
scene being dominated by planar surfaces. We carry out the
3D reconstruction and the plane fitting in a simultaneous
and integrated manner using the recent PEARL framework
proposed in [2]. The algorithm alternates between a global
discrete optimization step, which merges plane surface hy- 
potheses and discards spurious detections, and a continu- 
ous optimization step over the symmetry energy, which re- 
fines the plane pose estimation taking into account surface 
slant. The output is a set of plane hypotheses and a semi- 
dense PPR of the 3D scene, where the reconstructed line 
cuts are labeled according to the plane detections. 

1.1. Related Work 

Several works in PPR start by obtaining a sparse 3D reconstruction
of the scene (e.g. point clouds, edge lines, etc.), then
establish plane hypotheses by applying multi-model fitting to
the reconstructed data, and finally use these hypotheses to guide
the dense stereo process and/or perform a piecewise-planar
segmentation of the input images [6, 14, 7]. Werner and Zisserman
use multiple cues and assumptions to find dominant surface
orientations, and then perform plane-sweep reconstruction
along the detected normal directions. Pollefeys et al. [7] detect
planar surfaces in urban environments from 3D point features
obtained from SfM, and use the estimated normals for guiding
plane-sweep stereo.

Furukawa et al. [8] propose to perform PPR assuming a
Manhattan-world model. They reconstruct 3D patches in textured
image regions from multiple views using [17], and use the
normals of these patches to establish plane hypotheses. These
hypotheses are then used in an MRF formulation for pixel-wise
plane labeling. In [9], Sinha et al. introduce a probabilistic
framework for assigning plane hypotheses to pixels, with the
evidence of planar surfaces being provided by point cloud
reconstruction, estimation of vanishing lines, and sparse
reconstruction of edges. Gallup et al. [10] propose a stereo method
capable of handling both planar and non-planar objects
contained in the scene. A robust procedure based on RANSAC is
used for fitting plane hypotheses to dense depth maps, followed
by an MRF formulation for plane labeling of the input images.

An alternative strategy is to over-segment the stereo images 
based on color information and fit a 3D plane to each non- 
overlapping region. The number of planes to be considered 
is defined by the segmentation result, which acts as a smooth- 
ness prior during the global optimization. This segmentation 
information is either used as a hard minimization constraint 
[18, 19, 20] or as a soft constraint [21]. The main weakness 
of this type of strategy is the assumption that planar surfaces 
in the scene have different colors, which is often not the case 
in most man-made environments (e.g. walls, doors, windows, 
etc). 

The drawback of the approaches described so far is the fact 
that depth estimation and plane fitting are carried out in a sequential
and decoupled manner. The errors in the extracted 3D evidence 
may affect the accuracy of the plane pose estimation, and the in- 
ferred planar surfaces are not used for refining the initial depth 
estimates. 

There are a few approaches [22, 23, 24] that perform PPR by
carrying out stereo matching and 3D plane fitting iteratively. The
strategy consists of alternating between segmenting the input
images into non-overlapping regions and estimating the plane
parameters for each region. However, as stated by the
authors of [23], these types of algorithms can easily become
stuck in a local minimum whenever they face challenging surface
structures, e.g. surfaces with low and/or repetitive texture.

1.2. Article Overview and Notation 

Section 2 reviews three background concepts that are used
throughout the article. Section 3 proposes an algorithm for
reconstructing line cuts along a single virtual cut plane, and
Section 4 shows how the SymStereo estimates can be refined
when prior slant information is available. Then, Section 5
presents an algorithm for semi-dense PPR that combines SymStereo
and PEARL. Finally, Section 6 reports experiments in
PPR, where the accuracy of the plane estimation and pixel labeling
is evaluated with respect to ground truth data, and the
performance of our pipeline is compared with two different
strategies. The results show that the plane hypotheses computed
using our symmetry-based algorithm outperform the approaches
based on dense stereo reconstruction and sparse feature
matching.

We represent scalars in italic, e.g. $s$, vectors in bold characters,
e.g. $\mathbf{p}$, matrices in sans serif font, e.g. $\mathsf{M}$, and image
signals in typewriter font, e.g. $\mathtt{I}$. Unless stated otherwise, we
use homogeneous coordinates for points and other geometric
entities, e.g. a point with non-homogeneous image coordinates
$(p_1, p_2)$ is represented by $\mathbf{p} \sim (p_1 \;\; p_2 \;\; 1)^{\mathsf{T}}$, with $\sim$ denoting
equality up to scale.

2. Background 

This section briefly reviews three background concepts that
are used throughout the article, namely Stereo-Rangefinding using
SymStereo (Section 2.1), the energy-based multi-model fitting
framework called PEARL (Section 2.2), and a global pixel-wise
plane labeling formulation (Section 2.3). There is no major
novelty here, so readers who are familiar with these concepts
can skip this section.

2.1. Stereo-Rangefinding using SymStereo 

The SymStereo framework [1] was proposed for matching
pixels across stereo views using symmetry analysis instead of
traditional photo-consistency. Let $\mathtt{I}$ and $\mathtt{I}'$ be a pair of rectified
stereo images and consider a virtual cut plane $\Pi$ (see Figure 1).
The orientation of the virtual plane is arbitrary, the only
requirement being that it intersects the baseline. Under such
circumstances, the left and right back-projections become
reflected one with respect to the other at the locations where the
virtual plane intersects the scene. Thus, the sum of both back-projections
gives rise to an image signal that is locally symmetric
around the profile cut, while the subtraction results in
a signal that is anti-symmetric. These symmetries are usually
not strict symmetries due to perspective distortion, surface slant
and occlusions, but can be used as cues to recover the profile
cut where the virtual plane meets the scene.

Assuming that the world coordinate system is coincident
with the reference frame of the left view, the virtual cut plane
$\Pi$ can be represented by the homogeneous vector

$$\Pi \sim (\mathbf{n}^{\mathsf{T}} \;\; -h)^{\mathsf{T}}, \tag{1}$$

where $\mathbf{n}$ indicates the direction orthogonal to the plane

$$\mathbf{n} \sim (n_1 \;\; n_2 \;\; n_3)^{\mathsf{T}}.$$

The homogeneous coordinates of the intersection point $\mathbf{O}$ of the
virtual cut plane with the baseline are given by [1]:

$$\mathbf{O} \sim (o_1 \;\; 0 \;\; 0 \;\; 1)^{\mathsf{T}},$$

where $o_1 = h / n_1$ follows from Equation 1. Using $\beta$ to denote
the ratio between $o_1$ and the baseline length $b$, the plane $\Pi$
passes between the cameras iff the following condition holds:

$$0 < \beta < 1.$$

For efficiency purposes, the images do not need to be explicitly
back-projected onto the virtual plane $\Pi$. Instead, the homography
$\mathsf{H}$ induced by $\Pi$ can be used to map points from the right
view into the left view [1]:

$$\mathsf{H} \sim \begin{pmatrix} 1 + \dfrac{b n_1}{h - b n_1} & \dfrac{b n_2}{h - b n_1} & \dfrac{b n_3}{h - b n_1} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{2}$$

Assuming that $\hat{\mathtt{I}}$ is the result of warping $\mathtt{I}'$ using $\mathsf{H}$,
it follows from the mirroring effect [1] that $\mathtt{I}$ and $\hat{\mathtt{I}}$ are
reflected around the image of the profile cut (see Figure 1). Thus,
the sum of $\mathtt{I}$ and $\hat{\mathtt{I}}$ yields an image signal $\mathtt{I}_S$ that is symmetric
around the locus where the profile cut is projected. In a
similar manner, the difference between $\mathtt{I}$ and $\hat{\mathtt{I}}$ gives rise to an
image signal $\mathtt{I}_A$ that is anti-symmetric at the exact same location.
SymStereo detects the image of the profile cut by jointly
evaluating the symmetry and anti-symmetry in $\mathtt{I}_S$ and $\mathtt{I}_A$. This
provides an implicit manner of recovering depth along $\Pi$ and
achieving data association across views.
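
As an illustration of the plane-induced mapping, the following minimal sketch builds the homography of Equation 2 and forms the sum and difference signals. It assumes normalized (calibrated) rectified image coordinates and a right camera centre at $(b, 0, 0)$ in the left-camera frame; the function names are hypothetical and are not taken from [1].

```python
import numpy as np

def induced_homography(n, h, b):
    """Homography of Equation 2 (right view -> left view).

    n: (3,) normal of the virtual cut plane, h: plane offset (n . X = h),
    b: baseline length. Assumption: normalized rectified cameras, right
    camera centre at (b, 0, 0) in the left-camera frame.
    """
    n1, n2, n3 = n
    d = h - b * n1
    return np.array([[1.0 + b * n1 / d, b * n2 / d, b * n3 / d],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

def baseline_ratio(n, h, b):
    """beta = o_1 / b, where o_1 = h / n_1 is where the plane meets the baseline."""
    return h / (n[0] * b)

def symmetry_signals(I_left, I_warped):
    """Sum (I_S) and difference (I_A) of the left image and the warped right image."""
    return I_left + I_warped, I_left - I_warped
```

A 3D point lying on the cut plane should map from its right-view projection exactly onto its left-view projection, which gives a quick sanity check for the matrix entries.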

Since the symmetries are induced using virtual cut planes,
SymStereo is particularly well suited for recovering depth along
pre-defined scan planes. As discussed in [1], this is an effective
way of probing into the 3D structure, resulting in profile
cuts that resemble the ones obtained with a Laser-Rangefinder
[15]. The independent estimation of depth along a scan plane
is called Stereo-Rangefinding. It was concluded in [1] that the
logN matching cost is the top-performing metric for this purpose.
The logN cost is based on local frequency analysis for
locating symmetric structures by employing a bank of $N$ log-Gabor
wavelets (we set $N = 10$ in this article). The output of
log10 is the joint energy $\mathtt{E}$, where the image of the profile cut is
highlighted (see Figure 1).

2.2. Energy-based multi-model fitting using PEARL 

Isack and Boykov argued in [2] that methods that greedily 
search for models with most inliers while ignoring the over- 
all classification of data are a flawed approach to multi-model 
fitting, and that formulating the fitting as an optimal labeling 
problem with a global energy function is preferable. For this 
purpose, they propose the PEARL algorithm, which consists of three
main steps:

1. Propose an initial set of plausible models (labels) $\mathcal{L}_0$ from
the observations.

2. Expand the label set for estimating its spatial support (inlier
classification).

3. Re-estimate the inlier models by minimizing some error
function.

Figure 1: The virtual cut plane $\Pi$ (yellow) passes between the cameras and intersects the 3D scene in a non-continuous 3D curve (magenta). Let $\hat{\mathtt{I}}$ be the result
of warping $\mathtt{I}'$ by the homography induced by $\Pi$. The images $\mathtt{I}_S$ and $\mathtt{I}_A$ are, respectively, symmetric and anti-symmetric around the image of the profile cut
(magenta). The output of the logN joint symmetry and anti-symmetry quantification method is the energy map $\mathtt{E}$ that highlights the image of the profile cut.



Given the initial model set $\mathcal{L}_0$, the multi-model fitting is cast as
a global optimization where each model in $\mathcal{L}_0$ is interpreted as
a particular label $f$. Consider that $d \in \mathcal{D}$ is a data point and that
$f_d$ is a particular label in $\mathcal{L}_0$ assigned to $d$. The objective is to
compute the labeling $\mathbf{f} = \{f_d \mid d \in \mathcal{D}\}$ such that the following
energy is minimized:

$$E(\mathbf{f}) = \sum_{d \in \mathcal{D}} D_d(f_d) + \lambda_S \sum_{(d,e) \in \mathcal{N}} V_{d,e}(f_d, f_e) + \lambda_L \, |\mathcal{F}_{\mathbf{f}}|, \tag{3}$$

where $\mathcal{N}$ is the neighborhood system considered for $d$, $D_d(f_d)$
is some error that measures the likelihood of point $d$ belonging
to model $f_d$, and $V_{d,e}$ is the spatial smoothness term that
encourages piecewise smooth labeling by penalizing configurations
$\mathbf{f}$ that assign different labels to neighboring nodes $d$ and $e$.
The label term is used for describing the data points using as
few unique models as possible, with $\mathcal{F}_{\mathbf{f}}$ being the subset of
different models assigned to the nodes $d$ by the labeling $\mathbf{f}$ (see [2]
for further details). In order to handle outlier data points in $\mathcal{D}$,
the outlier label $f_\emptyset$ is added to $\mathcal{L}_0$. Any point $d$ to which
the label $f_\emptyset$ is assigned is considered an outlier, and usually has a
constant likelihood measure $D_d(f_d = f_\emptyset) = \tau$. The energy of
Equation 3 is efficiently minimized using $\alpha$-expansion [2].
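
To make the three terms of the labeling energy concrete, here is a minimal sketch that evaluates a data term, a weighted smoothness term and a label cost for a given labeling. It only evaluates the energy; it does not perform $\alpha$-expansion. The function name, the callback signatures and the choice of not charging a label cost for the outlier label are assumptions made for illustration.

```python
def pearl_energy(labels, data_cost, neighbors, smooth_cost,
                 lambda_s=1.0, lambda_l=20.0, outlier_label=-1):
    """Evaluate data + lambda_s * smoothness + lambda_l * (number of used models).

    labels:      list with one label per data point d
    data_cost:   callable (d, f) -> float
    neighbors:   iterable of index pairs (d, e)
    smooth_cost: callable (d, e, f_d, f_e) -> float
    Assumption: the outlier label does not count as a model in the label term.
    """
    data = sum(data_cost(d, f) for d, f in enumerate(labels))
    smooth = sum(smooth_cost(d, e, labels[d], labels[e]) for d, e in neighbors)
    used_models = {f for f in labels if f != outlier_label}
    return data + lambda_s * smooth + lambda_l * len(used_models)
```

With a large label weight, merging two similar models into one lowers the energy as long as the resulting increase in data cost stays below that weight, which is the mechanism that removes redundant hypotheses.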

Finally, the third step of PEARL consists of re-estimating the
model labels $f$ in $\mathcal{L}_0$ with a non-empty set of inliers
$\mathcal{D}(f) = \{d \in \mathcal{D} \mid f_d = f\}$. Let $m_f$ be the model associated with the label
$f$. Each model $m_f$ is refined by minimizing the error cost over
its parameters:

$$m_f^{*} = \arg\min_{m_f} \sum_{d \in \mathcal{D}(f)} D_d(f).$$

The models with non-empty inlier sets in $\mathcal{L}_0$ are replaced with the
refined models $m_f^{*}$, and the labels with empty sets are discarded.
The new set of labels $\mathcal{L}_1$ is then used in a new expand step, and
we iterate between discrete labeling and plane refinement until
the $\alpha$-expansion optimization does not decrease the energy of
Equation 3.

2.3. MRF for Plane Labeling 

Given a set of plane hypotheses in the scene, the next
step of most PPR algorithms is to compute a pixel-wise plane
labeling of the input images. We follow a standard MRF formulation
for comparing all the tested algorithms. The objective is
to minimize an energy involving data, smoothness and labeling
terms (refer to Equation 3). In this case, the nodes $d \in \mathcal{D}$ are
the image pixels, and the labels $f \in \mathcal{P}$ are the plane hypotheses.
A 4x4 neighborhood $\mathcal{N}_4$ is assumed for neighboring pixels
$d$ and $e$, and the data term is defined as

$$D_d(f) = \begin{cases} \min(\rho_d(f), \rho_{\max}) & \text{if } f \in \mathcal{P} \\ \gamma \, \rho_{\max} & \text{if } f = f_\emptyset \end{cases} \tag{4}$$

where $\rho_d(f)$ is the photo-consistency between the pixels in
the two views put into correspondence by the plane associated
with label $f$. For measuring the photo-consistency, we use
Zero-mean Normalized Cross-correlation (ZNCC). The photo-consistency
metric is given by $\rho_d(f) = (1 - \mathrm{ZNCC}(f))/2$,
where $\mathrm{ZNCC}(f)$ is the cost obtained using ZNCC for the
plane hypothesis $f$ ($\rho_{\max}$ and $\gamma$ are constant parameters).
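
The data term can be sketched as follows, assuming that the two patches put into correspondence by a plane hypothesis have already been extracted. The ZNCC score is mapped to $[0, 1]$ by $(1 - \mathrm{ZNCC})/2$ and truncated; the discard label receives a constant cost. The function names and the handling of textureless patches (score 0) are illustrative assumptions.

```python
import numpy as np

def zncc(a, b, eps=1e-12):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # Assumption: a constant (textureless) patch gets a neutral score of 0.
    return float((a * b).sum() / denom) if denom > eps else 0.0

def plane_data_cost(patch_left, patch_right, rho_max, gamma, is_discard=False):
    """Truncated photo-consistency cost, or gamma * rho_max for the discard label."""
    if is_discard:
        return gamma * rho_max
    rho = (1.0 - zncc(patch_left, patch_right)) / 2.0
    return min(rho, rho_max)
```

Because ZNCC is invariant to affine intensity changes, a patch and a brightened copy of it yield a cost close to zero, while contrast-inverted patches hit the truncation value.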
The smoothness term is defined as:

$$V_{d,e}(f_d, f_e) = \begin{cases} 0 & \text{if } f_d = f_e \\ M & \text{if } (f_d \lor f_e) = f_\emptyset \\ D' & \text{otherwise} \end{cases} \tag{5}$$

where $D' = \min(D, M) + m$, $D$ is the 3D distance between neighboring
points according to their planes $f_d$ and $f_e$, respectively, $M$ and $m$
are constant parameters, and

$$\Delta \mathtt{I} = |\mathtt{I}(d) - \mathtt{I}(e)|$$

is the image gradient.

The data and the smoothness terms are, with minor differences,
similar to the ones used in the graph-cut labeling of
Gallup et al. [10]. We additionally add the labeling term
$\lambda_L |\mathcal{F}_{\mathbf{f}}|$ to prevent very close plane hypotheses in $\mathcal{P}$ from
being assigned in $\mathbf{f}$. This has the effect of simplifying the 3D model
using as few unique planes as possible.

3. Reconstruction of Lines along a single cut plane 

The reconstruction of lines from two or more views has, in
the vast majority of existing algorithms, one common denominator:
the detection of line segments in the input views that are
matched in subsequent steps. If there are no (salient)
line segments in the input images, then no 3D line reconstructions
can be obtained.

Figure 2: Reconstruction of 3D line cuts from a stereo pair along a virtual cut plane $\Pi$. We use SymStereo and employ the log10 symmetry-based matching cost
for obtaining the joint energy $\mathtt{E}$. The energy $\mathtt{E}$ is used as input to a weighted Hough transform for extracting line cuts (black lines), from which the most appropriate
hypotheses (in this example only one line cut (blue) is detected) are selected using a global framework constituted by data, smoothness and label costs.

This section describes an algorithm that reconstructs a set of
3D line cuts along a single virtual cut plane $\Pi$ using Stereo-Rangefinding.
This is achieved by noting that the intersection
of $\Pi$ with a plane in the scene is a line (e.g. in Figure 2, the
intersection of $\Pi$ with the floor plane is the blue line cut). The
3D lines corresponding to the intersection of $\Pi$ with multiple
planes are projected onto the stereo views as line segments,
whose locations in most cases cannot be perceived
from the input images alone (there are no visible edges). However,
these lines can be reliably detected and estimated from
the joint symmetry and anti-symmetry energy $\mathtt{E}$ that is obtained
from $\Pi$. Note that each line cut that is detected from a virtual
cut plane corresponds to a particular plane contained in the
scene. However, the parameters of that plane cannot be estimated
from a single cut plane (we will see in Section 5 how
to detect and estimate planes based on the information of more
than one virtual cut plane).

3.1. Line cut detection using Hough and PEARL 

As shown in Figure 2, we use the SymStereo framework
along a virtual cut plane $\Pi$ and employ the log10 symmetry
metric for computing the joint energy $\mathtt{E}$. Each pixel in $\mathtt{E}$ provides
the matching likelihood of a particular pair of pixels in
the stereo views, being an indirect measurement of the occupancy
probability in 3D along $\Pi$. The energy $\mathtt{E}$ is used as input
to a weighted Hough transform for extracting a set of line cut
hypotheses $\mathcal{L}_0$. This is accomplished by selecting the $N_H$ local
maxima in the Hough voting space.
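
The hypothesis generation step can be sketched with a weighted Hough transform in which every pixel votes with its energy value instead of a unit vote. This is a simplified illustration: it returns the strongest accumulator bins rather than true local maxima (no non-maximum suppression), and the $(\rho, \theta)$ line parameterization, the one-pixel bin size and the function name are assumptions.

```python
import numpy as np

def weighted_hough(E, n_theta=180, n_peaks=5):
    """Return the n_peaks strongest (rho, theta) bins of an energy-weighted Hough space.

    Lines are parameterized as rho = col * cos(theta) + row * sin(theta).
    """
    rows, cols = E.shape
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(rows, cols)))
    acc = np.zeros((2 * diag + 1, n_theta))
    r, c = np.nonzero(E > 0)          # only pixels with positive energy vote
    w = E[r, c]
    for j, t in enumerate(thetas):
        rho = np.round(c * np.cos(t) + r * np.sin(t)).astype(int) + diag
        np.add.at(acc[:, j], rho, w)  # accumulate energy-weighted votes
    flat = np.argsort(acc, axis=None)[::-1][:n_peaks]
    rho_idx, th_idx = np.unravel_index(flat, acc.shape)
    return [(int(ri) - diag, float(thetas[ti])) for ri, ti in zip(rho_idx, th_idx)]
```

In practice a suppression window around each selected peak would be needed so that the hypotheses are not all near-duplicates of the dominant line.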

Next, we formulate the line cut detection as a global labeling
problem in a PEARL framework, in which the objective is to
assign to each epipolar line (image row) a line cut hypothesis in
$\mathcal{L}_0$. Following the notation of Section 2.2, the data points $d$ of
the graph are the epipolar lines, with the size of the set $\mathcal{D}$ being
equal to the number of image rows, and the goal is to assign a
line segment label $f$ to each epipolar line $d$. The data term is
defined as

$$D_d(f) = \begin{cases} \min(1 - \mathtt{E}(d, x_f), \tau) & \text{if } f \neq f_\emptyset \\ \alpha_\emptyset \, \tau & \text{otherwise} \end{cases}$$

where $\mathtt{E}(r, c)$ is the joint energy value for row $r$ and column
$c$. The coordinate $x_f$ corresponds to the intersection between
the epipolar line $d$ and the line segment $\mathbf{l}_f$ associated with label
$f$. Note that the truncation parameter $\tau$ is used for handling
poorly matching surfaces, e.g. those containing low and/or repetitive
textures, while the discard label $f_\emptyset$ indicates that no satisfactory
line cut hypothesis can be assigned to $d$. In this case, the virtual
cut plane $\Pi$ has a high probability of not intersecting a planar
surface along the epipolar plane associated with $d$.
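
A minimal sketch of this data term follows, assuming that a line cut hypothesis is stored as a slope and intercept giving the column as a function of the row, and that the energy has been normalized to $[0, 1]$. The parameterization, the nearest-pixel lookup and the treatment of hypotheses that leave the image are assumptions; the default values of `tau` and `alpha_discard` are the ones reported for the experiments.

```python
import numpy as np

def line_cut_data_cost(E, row, line, tau=0.8, alpha_discard=0.7):
    """Cost of assigning a line cut hypothesis (or the discard label) to one image row.

    E:    joint energy map with values in [0, 1]
    line: (slope, intercept) with column x_f = slope * row + intercept,
          or None for the discard label (hypothetical encoding)
    """
    if line is None:
        return alpha_discard * tau
    slope, intercept = line
    col = int(round(slope * row + intercept))
    if not 0 <= col < E.shape[1]:
        return tau  # assumption: a hypothesis outside the image gets the truncated cost
    return min(1.0 - float(E[row, col]), tau)
```

Since the discard cost $\alpha_\emptyset \tau$ is below the truncation value $\tau$, a row whose best hypothesis sits on weak energy is cheaper to discard than to label.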

The smoothness term of neighboring nodes $d$ and $e$ is given by

$$V_{d,e}(f_d, f_e) = \begin{cases} 0 & \text{if } f_d = f_e \\ \lambda_\emptyset & \text{if } (f_d \lor f_e) = f_\emptyset \\ \dfrac{1}{\Delta\mathtt{I}^2 + 1} & \text{otherwise} \end{cases}$$

where

$$\Delta\mathtt{I} = |\mathtt{I}(d, x_{f_d}) - \mathtt{I}(e, x_{f_e})|$$

is the image gray-scale gradient. No penalization is assigned to
neighboring image rows $d$ and $e$ receiving the same label, while
if one node receives the discard label $f_\emptyset$, a non-zero cost
$\lambda_\emptyset$ is incurred. The smoothness term $V$ prefers label transitions
at locations of larger image gradient (lower smoothness
cost), which usually occur at the boundaries of two different
surfaces. We use a constant label term in Equation 3 for
favoring line cut assignments $\mathbf{f}$ with fewer labels.

Finally, after computing an initial labeling solution $\mathbf{f}$ for the
nodes $d$, the line cuts $\mathbf{l}_f$ are refined by optimizing their parameters
over the energy $\mathtt{E}$ via Levenberg-Marquardt (LM) [25]

$$\mathbf{l}_f^{*} = \arg\min_{\mathbf{l}_f} \sum_{d \in \mathcal{D}(f)} (1 - \mathtt{E}(d, x_f)), \tag{6}$$

where $\mathcal{D}(f)$ is the subset of image rows $d$ to which the label $f$
was assigned. Note that at each solver iteration, the point $x_f$
on $d$ is recomputed according to the current line cut hypothesis
$\mathbf{l}_f$. The new set of line cuts $\mathbf{l}_f^{*}$ is then used in a new global line
cut assignment (expand) step, and we iterate between discrete
labeling and line cut refinement until the energy of Equation 3
stops decreasing (which usually occurs after 2-3 iterations).

3.2. Experiments in line cut detection 

We performed experiments with our line cut detection approach
(see footnote 2) on various indoor scenes (see Figures 2-4) acquired using
a Bumblebee stereo camera from PointGrey, which has a
baseline of 24 cm and an image resolution of 1024 x 768 pixels.

In the first example of Figure 3, we detect 2 different line
cuts for each virtual cut plane: one corresponds to the intersection
of the cut planes with the floor and the other is due to the
intersection with the wall. Note that the matching of the
line segments across the views is almost perfect and consistent
for all virtual cut planes, even though the line cut detection
was carried out for each virtual cut plane independently. In
example (b), the scene consists of multiple planar surfaces,
some containing quite complicated textures. In this case, the
line cut estimation approach along a single virtual cut plane begins
to have difficulties. In situations where the cut plane intersects
the scene in low-textured regions, the symmetry-based matching
using log10 does not provide a well-defined ridge at the
locations of the image of the profile cut. As a result, the
algorithm prefers to label those regions with the $f_\emptyset$ label (e.g.
blue cut plane), since it has low confidence about the location
of the image of the profile cut. Finally, example (c) presents
some failure cases of this approach, namely slanted surfaces
with low texture. In these cases, the algorithm tends to (i) assign
more than one line cut label to the same planar
surface (noisy energy $\mathtt{E}$), (ii) miss the line cut altogether,
or (iii) compute wrong matches. Note that (i) could be handled
by increasing the label cost $\lambda_L$; however, this would imply that
line cuts corresponding to close planes (e.g. chair backs and
wall in example (c)) are assigned the same label. We will show
in Section 4 that these estimations can be considerably refined,
and in Section 5 that most of the difficulties are easily handled
by our PPR algorithm, which estimates plane hypotheses
from multiple virtual cut planes simultaneously.

Footnote 2: We used the same parameters for all the experiments: $N_H = 200$, $\lambda_S = 1$,
$\tau = 0.8$, $\alpha_\emptyset = 0.7$, $\lambda_\emptyset = 0.9$ and $\lambda_L = 20$, which were empirically selected
without much effort.

Figure 3: Results produced by our line cut detection algorithm along 5 virtual cut planes, for examples (a), (b) and (c). We show for each example the left and right views with the detected line
cuts overlaid (compare the matching between the views). Different colors indicate different cut planes, while different shades identify different line cuts.

Figure 4: Results for two different settings of $\alpha_\emptyset$ and $\lambda_L$: (a) $\alpha_\emptyset = 0.7$ and $\lambda_L = 20$; (b) $\alpha_\emptyset = 0.8$ and $\lambda_L = 10$. By varying these
parameters, we can control the algorithm to be more permissive with respect
to what is considered a line cut (b), while for lower values of $\alpha_\emptyset$ and higher
values of $\lambda_L$ the algorithm only detects line segments with high probability of
belonging to planar surfaces (a).

So far most of the examples contained only planar surfaces.
We show in Figure 4 a scene containing a non-planar object
above the floor plane. The choice between labeling only strict planes
(example (a)) and approximating non-planar surfaces by an appropriate
set of planes (example (b)) is controlled using different
settings of the weighting factor $\alpha_\emptyset$ and the label cost $\lambda_L$.
Using low values of $\alpha_\emptyset$ and high values of $\lambda_L$ implies that only
line cuts belonging to planar surfaces are reconstructed, while
higher values of $\alpha_\emptyset$ and low values of $\lambda_L$ make it possible to approximate
non-planar surfaces by various plausible line cuts.

4. Line cut detection under surface slant 




Figure 5: Geometric overview of SymStereo. The camera centers $\mathbf{C}$ and $\mathbf{C}'$
are separated by a distance $b$ (the baseline). The 3D point $\mathbf{Q}$ on $\Psi$ is detected
using the mirroring effect induced by the virtual cut plane $\Pi$ (yellow)
intersecting the baseline.

The SymStereo framework was introduced in [1], where a
thorough geometric analysis was provided. This section extends
[1] and studies the problem of surface slant in the context
of SymStereo. In plane-sweeping [26] it is possible to
integrate prior knowledge of the scene to select the sweeping
directions that maximize the performance of photo-consistency
based stereo (e.g. [27, 6]). We show that slant priors can also be
used in SymStereo for choosing the cut planes that render perfect
signal symmetries, and improve the overall accuracy and
robustness of the approach.

Consider a generic point $\mathbf{P}$ and a point $\mathbf{Q}$ that lies on the
same epipolar plane $\Psi$, and also assume that $\mathbf{Q}$ belongs to the
virtual cut plane $\Pi$ (see Figure 5). Let $\mathbf{p}$ and $\mathbf{p}'$, and $\mathbf{q}$ and $\mathbf{q}'$
be the image projections of $\mathbf{P}$ and $\mathbf{Q}$, respectively. The signed
distances between the images of the two points are defined as:

$$g = p_1 - q_1; \qquad g' = p'_1 - q'_1. \tag{7}$$

It is shown in [1] that:

$$\hat{g} = \hat{p}_1 - q_1 = \left(\frac{\beta}{\beta - 1}\right) g', \tag{8}$$

where $\hat{\mathbf{p}} \sim \mathsf{H}\mathbf{p}'$ (refer to Equation 2). Let $\Delta_p = p_1 - p'_1$ and
$\Delta_q = q_1 - q'_1$ be the stereo disparities of $\mathbf{P}$ and $\mathbf{Q}$, respectively,
and define

$$\Delta = \Delta_p - \Delta_q.$$

From Equation 7 it follows that $g' = g - \Delta$, and Equation 8 can
be written as

$$\hat{g} = \left(\frac{\beta}{\beta - 1}\right) (g - \Delta). \tag{9}$$



The deviation from perfect mirroring ($\hat{g} = -g$) around the projection
of the profile cut is a function of the differences in pixel disparities,
which are directly related to the depth variation in the
neighborhood of the 3D profile cut. Note that the virtual cut
plane $\Pi$ only affects the symmetry in terms of the intersection
point with the baseline. For similar conditions of relative depth
variation, any cut plane going through the same point $\mathbf{O}$ generates
symmetries of equivalent quality, regardless of its orientation.
Also note that, for the particular case of planes $\Pi$ intersecting
the baseline at the midpoint ($\beta = 0.5$), the symmetry is
perfect whenever the surfaces to reconstruct are fronto-parallel
to the stereo rig ($\Delta = 0$).



4.1. Slant prior for enhancing SymStereo 



Figure 6: Refinement using slant prior (top view of the scene in Figure 5). Assume
that Q lies on the plane $\Omega$. Then, we can determine the position on the baseline
$O^1$ (see Equation 12) that improves the induced symmetries. Using the vertical
virtual cut plane defined by $O^1$ and Q, it is possible to induce new symmetries
from which the refined point $Q^1$ is estimated.

Assume that the points P and Q also lie on the same scene
plane $\Omega \sim (\mathbf{m}^T \;\; -1)^T$, with $\mathbf{m} = (m_1, m_2, m_3)^T$,
that defines a homography M, similar to
Equation 2, mapping points in the right view into points in the
left view. Following this, $q \sim M q'$ and it can be shown that

$$ \Delta_q = m_1 b\, q_1 + m_2 b\, q_2 + m_3 b. $$

Since p is also the projection of a point on the same planar surface,
and $p_2 = q_2$ because P and Q lie on the same epipolar plane,
applying the homography M shows that $\Delta_p$ differs from $\Delta_q$ by
$\alpha_1 (p_1 - q_1)$, where

$$ \alpha_1 = m_1 b \tag{10} $$



is proportional to the slant of the plane along the horizontal
direction. Replacing in Equation 9, it follows that

$$ \hat{g} = \left( \frac{\beta}{\beta - 1} \right) (1 - \alpha_1)\, g. \tag{11} $$



The conclusion that can be drawn is that, given prior knowledge
about the position and orientation of the surface to be
reconstructed, we can determine the point of intersection between
the virtual plane $\Pi$ and the baseline that grants perfect induced
symmetry. The image signals are perfectly symmetric whenever
$\hat{g} = -g$, so that solving with respect to $\beta$ in Equation 11
yields

$$ \beta = \frac{1}{2 - \alpha_1}. \tag{12} $$

Following the previous analysis, and in case slant information
is available a priori, we suggest a simple approach for
refining the SymStereo depth estimates. Referring to Figure 6,
we start by applying a virtual cut plane $\Pi$ intersecting the baseline
at its midpoint $O = 0.5b$, from which the 3D point Q is
estimated. Assume that Q lies on the plane $\Omega$, whose horizontal
slant defines a particular $\alpha_1$ (Equation 10). Using
Equation 12, we can determine the position on the baseline
$O^1 = \beta^1 b$ that a new virtual cut plane should intersect for
enhancing the quality of the induced symmetries. This new vertical
virtual cut plane $\Pi^1$ is defined by the points $O^1$ and Q, from
which a refined 3D estimate $Q^1$ can be computed. Following
this, the overall quality of the 3D points obtained using SymStereo
can be iteratively refined by selecting appropriate virtual
planes intersecting specific points on the baseline.
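The relation between slant and induced symmetry can be checked numerically. The following Python sketch works inside a single epipolar plane with normalized image coordinates (unit focal length). It assumes, for illustration only, that the left camera sits at the origin, that the right camera is displaced by b along the baseline, that the virtual cut plane crosses the baseline at distance $\beta b$ from the left camera, and that the scene plane satisfies $m_1 X + m_3 Z = 1$. These conventions and all function names are assumptions of the sketch and are not taken from [1].

```python
def disparity(x, b, m1, m3):
    """Disparity of the scene-plane point imaged at left coordinate x.

    With the plane m1*X + m3*Z = 1 the depth is Z = 1 / (m1*x + m3),
    so the disparity b / Z is linear in x.
    """
    return b * (m1 * x + m3)


def transfer_right_to_left(x_right, beta, b, q_left, q_disp):
    """Map a right-image coordinate into the left image through the cut plane.

    The cut plane passes through O = (beta*b, 0) and through the 3D point Q
    that is imaged at q_left with disparity q_disp.
    """
    q_depth = b / q_disp
    q_x = q_left * q_depth
    slope = (q_x - beta * b) / q_depth            # cut plane: X = beta*b + slope*Z
    depth = b * (1.0 - beta) / (slope - x_right)  # right-camera ray meets the cut plane
    x_world = beta * b + slope * depth
    return x_world / depth                        # projection into the left camera


def induced_offsets(p_left, q_left, beta, b, m1, m3):
    """Return (g, g_hat): the true and the induced signed distances to the image of Q."""
    dq = disparity(q_left, b, m1, m3)
    dp = disparity(p_left, b, m1, m3)
    p_hat = transfer_right_to_left(p_left - dp, beta, b, q_left, dq)
    q_hat = transfer_right_to_left(q_left - dq, beta, b, q_left, dq)
    return p_left - q_left, p_hat - q_hat


def ideal_beta(alpha1):
    """Baseline position, as a fraction of b, for which g_hat = -g."""
    return 1.0 / (2.0 - alpha1)
```

Under these assumptions, calling `induced_offsets` with `beta = 0.5` on a slanted plane returns a `g_hat` that differs from `-g` by the factor $(1 - \alpha_1)$, while calling it with `ideal_beta(m1 * b)` restores the perfect mirroring. For a fronto-parallel plane (`alpha1 = 0`) the function `ideal_beta` returns the midpoint 0.5.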

5. PPR using SymStereo and PEARL 

This section describes an algorithm that combines SymStereo
(Section 2.1) with the geometric multi-model fitting algorithm
PEARL (Section 2.2) for semi-dense PPR (see Figure 7).
The input to the algorithm is a set of M joint energies that
were computed using logN from a set of M virtual cut planes
$\Pi$ that belong to a vertical pencil intersecting the baseline at its
midpoint. The output is a discrete set of planar surfaces and a
semi-dense 3D reconstruction, where each reconstructed point
belongs to a particular plane. The detected planes can then be
used as plane hypotheses in a global plane labeling strategy for
computing a dense model (see Section 2.3).

5.1. Formulation of the global framework

Consider that the midpoint of the baseline is the center of
projection of a virtual camera, which will be called the cyclopean
eye (see Figure 8). The image that is perceived by the cyclopean
eye has height equal to the number of epipolar planes $\Psi_r$ with
$r = 1, \ldots, R$ (one epipolar plane per image row), and the width
is given by the number of virtual cut planes $\Pi_i$ with $i = 1, \ldots, M$
(one cut plane for each column). Each pixel of the cyclopean
eye originates from the back-projection ray $d_{i,r}$, corresponding
to the intersection between $\Pi_i$ and $\Psi_r$. The objective is to
estimate the point on each $d_{i,r}$ that most likely belongs to a planar
surface. This problem is cast as a labeling problem following
a PEARL framework (Section 2.2). The nodes of the graph are
the back-projection rays $d_{i,r}$ of the cyclopean eye, and to each
$d_{i,r}$ we want to assign a plane label $f_d$. The set of possible
labels is $\mathcal{L}_0 = \{\mathcal{P}_0, f_\emptyset\}$, with $f_\emptyset$ meaning that no point on $d_{i,r}$
belongs to a planar surface. Note that we use $d$ instead of $d_{i,r}$
whenever the virtual and epipolar plane specifications are not
strictly necessary. We assume a $\mathcal{N}_4$ neighborhood for $d_{i,r}$ that
is defined by the four back-projection rays $d_{i\pm1,r}$ and $d_{i,r\pm1}$
(see Figure 8).
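The graph underlying this labeling problem can be sketched in a few lines of Python. The snippet below is only an illustration: it assumes zero-based indices, represents each back-projection ray by the pair (cut plane index, epipolar plane index), and lists every 4-neighborhood edge once. The function name is hypothetical.

```python
def build_cyclopean_graph(num_cut_planes, num_epipolar_planes):
    """Return the nodes (i, r) and the 4-neighborhood edges of the cyclopean image.

    i indexes the virtual cut planes (columns) and r the epipolar planes (rows).
    Each undirected edge is listed once.
    """
    nodes = [(i, r)
             for r in range(num_epipolar_planes)
             for i in range(num_cut_planes)]
    edges = []
    for i, r in nodes:
        if i + 1 < num_cut_planes:          # neighbor on the next cut plane
            edges.append(((i, r), (i + 1, r)))
        if r + 1 < num_epipolar_planes:     # neighbor on the next epipolar plane
            edges.append(((i, r), (i, r + 1)))
    return nodes, edges
```

A labeling is then a mapping from each node to either a plane index or the discard label.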
Figure 7: Pipeline for PPR using a pair of calibrated images: (1) apply the line cut detection algorithm described in Section 3 along M virtual cut planes for obtaining
a sparse set of 3D line cuts; then (2) use the global semi-dense PPR algorithm described in Section 5 for computing planar surfaces and obtain a semi-dense PPR;
use the line cuts estimated in (1) for obtaining plane hypotheses; and (3) use the global pixel-wise plane labeling (Section 2.3) for computing a dense PPR model
from the plane hypotheses in (2).

Figure 8: The scene is sampled by a discrete set of virtual cut planes $\Pi_i$. This
can be thought of as an image created by a virtual camera that is located between
the cameras (cyclopean eye), where each epipolar plane $\Psi_r$ projects onto one
row and each $\Pi_i$ projects onto one column of the image. Each pixel of the
cyclopean eye originates from the back-projection ray $d_{i,r}$ (red). The $\mathcal{N}_4$
neighbors of $d_{i,r}$ are $d_{i\pm1,r}$ (blue) and $d_{i,r\pm1}$ (green).



5.2. Initial plane hypotheses 

As discussed in Section 3, each line cut is a possible location
of intersection of a virtual cut plane with a planar surface in
the scene. In order to propose an initial set of plane models
$\mathcal{P}_0$ for PEARL, we could generate all possible planes that can
be obtained from two line cuts belonging to different planes $\Pi$,
as originally proposed in [16, 28]. However, depending on
the number of cut planes that are used, the set $\mathcal{P}_0$ can easily
become very large. We noticed that using only pairs of line cuts
from neighboring cut planes $\Pi_{i \pm \delta}$ ($\delta \le 2$) drastically decreases the
size of $\mathcal{P}_0$ and is enough for initializing our piecewise-planar
labeling approach. Since it is unlikely that line cuts intersecting
different epipolar planes correspond to the same planar surface,
we further reduce $\mathcal{P}_0$ and only use pairs of line cuts that have
a minimum of $N_e$ epipolar lines of overlap ($N_e = 10$ in this
article).
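The pairing rule can be illustrated with a short Python sketch. It assumes a hypothetical representation in which each line cut is a tuple (cut plane index, first row, last row) with inclusive row bounds. The plane fitting from each pair is not shown, and the function names are illustrative.

```python
from itertools import combinations


def row_overlap(cut_a, cut_b):
    """Number of epipolar lines (rows) shared by two line cuts."""
    first = max(cut_a[1], cut_b[1])
    last = min(cut_a[2], cut_b[2])
    return max(0, last - first + 1)


def candidate_pairs(line_cuts, max_plane_gap=2, min_overlap=10):
    """Select pairs of line cuts that may span a common plane hypothesis.

    A pair is kept when the two cuts come from different but neighboring
    cut planes (index gap of at most max_plane_gap) and share at least
    min_overlap epipolar lines.
    """
    pairs = []
    for cut_a, cut_b in combinations(line_cuts, 2):
        gap = abs(cut_a[0] - cut_b[0])
        if 1 <= gap <= max_plane_gap and row_overlap(cut_a, cut_b) >= min_overlap:
            pairs.append((cut_a, cut_b))
    return pairs
```

Restricting the index gap keeps the number of hypotheses roughly linear in the number of cut planes, instead of quadratic when all pairs are considered.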



5.3. Data and smoothness term 

The data term for the back-projection ray $d_{i,r}$ is defined as

$$ D_{d_{i,r}}(f) = \begin{cases} \min\left(1 - E_i(r, x_f),\, \tau\right) & \text{if } f \in \mathcal{P}_0 \\ \tau & \text{if } f = f_\emptyset \end{cases} $$

where $E_i$ is the joint energy associated with the virtual cut plane
$\Pi_i$, $r$ is the row corresponding to the epipolar plane $\Psi_r$ and $\tau$
is a constant. The coordinate $x_f$ is the column defined by the
hypothesis $f$, corresponding to the intersection of $d_{i,r}$ with the
plane indexed by $f$. Note that, similarly to [10], the non-planar
label $f_\emptyset$ indicates that no satisfactory plane hypothesis can be
assigned to $d_{i,r}$. In this case, the back-projection ray $d_{i,r}$ has
a high probability of not intersecting the scene in a planar surface.
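As an illustration, the data term can be written as a small Python function. The sketch assumes that the joint energy of one cut plane is stored as a 2D list indexed by [row][column] with values normalized to [0, 1], and that the discard label is charged the constant $\tau$. Both points are assumptions made for the example, as are the function and parameter names.

```python
def data_cost(energy, row, column, tau, is_discard=False):
    """Truncated data cost of assigning a label to one back-projection ray.

    energy     -- joint energy of the ray's cut plane, indexed [row][column]
    row        -- row of the epipolar plane of the ray
    column     -- column where the ray meets the hypothesized plane
    tau        -- truncation constant, also charged to the discard label
    is_discard -- True when the label is the discard label
    """
    if is_discard:
        return tau
    return min(1.0 - energy[row][column], tau)
```

The truncation bounds the penalty of any single ray, so that rays with weak symmetry evidence do not dominate the total energy.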

Inspired by the work of Sinha et al. [9], the smoothness term
for neighboring nodes $d$ and $e$ is given by

$$ V_{de}(f_d, f_e) = \begin{cases} 0 & \text{if } f_d = f_e \\ \lambda_1 & \text{if } (d, e, f_d, f_e) \in S_1 \\ \lambda_2 & \text{if } (d, e, f_d, f_e) \in S_2 \\ \lambda_3 & \text{if } (d, e) \in S_3 \\ \lambda_4 & \text{if } (f_d \vee f_e) = f_\emptyset \\ 1 & \text{otherwise} \end{cases} \tag{13} $$

where $0 < \lambda_1 < \lambda_2 < \lambda_3 < 1$, and the content of the sets
$S_1$, $S_2$ and $S_3$ is described next. Note that no penalization is
assigned to neighboring nodes receiving the same plane label,
while in the case of one node obtaining the discard label $f_\emptyset$, a
non-zero cost $\lambda_4$ is added to the plane configuration $\mathbf{f}$.




(a) Crease Edges (b) Detected Line Segments

Figure 9: We show in (a) some crease edges obtained from intersections of two
different planes in $\mathcal{P}_0$, while in (b) the result of the clustering of concurrent
lines is shown. Each group of lines (different groups have different colors)
provides a possible vanishing point location. The white line segments did not
receive any vanishing point label.



Following a similar reasoning as [9], plane transitions be- 
tween neighboring nodes d and e are more likely to occur in 
the presence of crease or occlusion edges. A crease edge corre- 
sponds to the projection of the 3D line of intersection between 
two different planes in the scene, while occlusion boundaries 
arise from spatially separated objects in 3D whose image pro- 
jections interfere with each other. 
Let the point $p_{d,f_d}$ ($p_{e,f_e}$) be the projection of the intersection
between the back-projection ray $d$ ($e$) and the plane associated
to $f_d$ ($f_e$). In order to encourage plane label transitions
at crease edges, we store in the set $S_1$ the quadruples
$(d, e, f_d, f_e)$ in which the points $p_{d,f_d}$ and $p_{e,f_e}$ are located on
different sides of the crease edge defined by $f_d$ and $f_e$. Whenever
$\mathbf{f}$ contains assignments located in $S_1$, it incurs a
penalization $\lambda_1$ (Figure 9(a) shows some crease edges that are
estimated from real imagery).

Occlusion edges usually coincide with visible 2D line segments
in the input views and are often aligned with the vanishing
directions of scene planes (Figure 9(b)). In order to find
possible occlusion edges, we detect 2D line segments in the
left view $I$ using the Line Segment Detector [29]. Each line
segment is a possible location of an occlusion boundary. For
clustering concurrent lines, we use the global vanishing point
detection algorithm proposed by Antunes et al. [30]. The set
$S_2$ contains the quadruples $(d, e, f_d, f_e)$ where the points $p_{d,f_d}$
and $p_{e,f_e}$ are located on different sides of a line segment that
was clustered to a particular vanishing point, whose direction is
orthogonal to the plane associated either to $f_d$ or to $f_e$. Finally,
$S_3$ contains the remaining pairs $(d, e)$ whose projections are
on different sides of a line segment to which no vanishing point
was assigned. Note that, in contrast to [9], we do not perform
any line matching between the views, which substantially decreases
the complexity of the algorithm.
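The pairwise term described above can be sketched as a Python function. The example assumes that the sets $S_1$, $S_2$ and $S_3$ have been precomputed and stored as Python sets of tuples, that the discard label is encoded as `None`, and that the cases are tested in the order in which they are listed in Equation 13. The encoding and the function name are hypothetical.

```python
DISCARD = None  # hypothetical encoding of the discard label


def smoothness_cost(d, e, f_d, f_e, s1, s2, s3, lambdas):
    """Pairwise cost for neighboring rays d and e with labels f_d and f_e.

    s1, s2  -- sets of quadruples (d, e, f_d, f_e) at crease / occlusion edges
    s3      -- set of pairs (d, e) separated by an unclustered line segment
    lambdas -- tuple (lambda1, lambda2, lambda3, lambda4)
    """
    lam1, lam2, lam3, lam4 = lambdas
    if f_d == f_e:
        return 0.0                      # same label: no penalty
    if (d, e, f_d, f_e) in s1:
        return lam1                     # transition at a crease edge
    if (d, e, f_d, f_e) in s2:
        return lam2                     # transition at a likely occlusion edge
    if (d, e) in s3:
        return lam3                     # transition at an unclustered segment
    if f_d is DISCARD or f_e is DISCARD:
        return lam4                     # one of the nodes is discarded
    return 1.0                          # unsupported plane transition
```

Label changes supported by image or geometric evidence are therefore cheaper than unsupported ones, which steers plane boundaries towards crease and occlusion edges.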

5.4. Plane refinement 

The third step of PEARL (Section 2.2) is to re-estimate the
plane model parameters using the inliers of the discrete labeling
$\mathbf{f}$. Let $\Omega_f$ be the plane associated to $f$ to which has been
assigned a non-empty set of inliers $D(f) = \{d \in \mathcal{V} \mid f_d = f\}$.
Each plane $\Omega_f$ is refined by optimizing its parameters
over the energies E via LM [25]:

$$ \Omega_f^{*} = \arg\min_{\Omega} \sum_{d_{i,r} \in D(f)} \left(1 - E_i(r, x_{\Omega})\right), \tag{14} $$

where $x_{\Omega}$ is the column defined by the intersection of $d_{i,r}$
with $\Omega$. The new set of labels $\mathcal{P}_1 = \{\Omega_f^{*}\}$ is then used in a
new expand step, and we iterate between discrete labeling and
plane refinement until the α-expansion optimization does not
decrease the energy of Equation 3 (which usually takes 2 to 3
iterations).
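The alternation between discrete labeling and plane refinement can be summarized by the following Python skeleton. It is a sketch under stated assumptions: `label_rays` stands for the α-expansion step, `refine_plane` for the LM minimization of Equation 14 and `total_energy` for the energy of Equation 3. All three are hypothetical callables supplied by the caller, labels are indices into the current plane list, and `None` encodes the discard label.

```python
def pearl_alternation(initial_planes, label_rays, refine_plane, total_energy,
                      max_iterations=10):
    """Alternate between labeling and refinement until the energy stops decreasing.

    label_rays(planes)             -> dict mapping ray -> plane index or None
    refine_plane(plane, rays)      -> refined plane computed from its inlier rays
    total_energy(planes, labeling) -> scalar energy of the configuration
    """
    planes = list(initial_planes)
    labeling = label_rays(planes)
    best = total_energy(planes, labeling)
    for _ in range(max_iterations):
        # Collect the inliers of every plane that received support.
        inliers = {}
        for ray, label in labeling.items():
            if label is not None:
                inliers.setdefault(label, []).append(ray)
        # Refine supported planes; planes without inliers are dropped.
        candidate = [refine_plane(planes[label], rays)
                     for label, rays in sorted(inliers.items())]
        new_labeling = label_rays(candidate)
        energy = total_energy(candidate, new_labeling)
        if energy >= best:
            break                       # no further decrease: stop
        planes, labeling, best = candidate, new_labeling, energy
    return planes, labeling, best
```

The loop stops as soon as a refinement followed by relabeling fails to lower the energy, which mirrors the stopping rule stated in the text.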

5.5. Plane refinement after PEARL 

We have discussed in Section 4.1 that SymStereo can be enhanced
when slant information is available. The output
of the global algorithm described previously is the labeling $\mathbf{f}$
that assigns to each back-projection ray $d$ a plane $\Omega$. The intersection
of $d$ with $\Omega$ defines a 3D point Q, and $\Omega$ also defines
$\alpha_1$, which is proportional to the 3D slant in the neighborhood of
Q. Following this, and as described in Section 4.1, the position
of Q can be refined by iteratively optimizing $\beta$.

Let $\Omega$ be the plane associated to label $f$ to which
has been assigned a non-empty set of inliers
$D(f) = \{d_{i,r} \in \mathcal{V} \mid f_{d_{i,r}} = f\}$, and consider that $Q_{i,r}$ is the
intersection between the ray $d_{i,r}$ and $\Omega$ (refer to Figure 6). For each
$d_{i,r}$, we compute the corresponding ideal $\beta^1$ and obtain a new
back-projection ray $d^1_{i,r}$. The new ray $d^1_{i,r}$ is located on the
same epipolar plane, but on the virtual cut plane intersecting
the point $O^1$ and the previously reconstructed point $Q_{i,r}$. Given
the new virtual cut plane $\Pi^1$, a new homography mapping (see Equation 2)
can be used for inducing improved symmetries, from which
the joint energy $E^1_{i,r}$ is re-calculated. The new joint energies $E^1_{i,r}$
are used in a new refinement step using LM (Section 5.4). We
iterate between re-computing new back-projection rays $d^n_{i,r}$ and
refining $\Omega^n$ for a pre-defined number of times (4 in this article).

5.6. Comparison between line cut reconstruction and semi- 
dense PPR 

We show in Figure 10 a brief comparison between the line cut
reconstruction algorithm described in Section 3 and the
semi-dense PPR approach described in this section. When the
virtual cut planes intersect planar surfaces with some texture 
and far from object discontinuities, the independent reconstruc- 
tion along single virtual planes provides accurate results (ex- 
ample (a)). In scenarios containing multiple planes and compli- 
cated textures (examples (b-c)), the independent line cut recon- 
struction has some difficulties. These problems are solved us- 
ing our semi-dense PPR pipeline that estimates planar surfaces 
in the scene along different virtual cut planes simultaneously 
and in a global manner. 

6. Experiments in Piecewise-Planar Reconstruction 

We proposed a new algorithm for detecting and
estimating planar surfaces in the scene that combines SymStereo
and PEARL optimization. To show its
advantages with respect to existing PPR approaches, this
section runs a set of experiments in PPR from a pair of stereo
images, and compares the performance of the proposed algorithm
with two state-of-the-art approaches.

The evaluation is carried out on a new dataset comprising challenging
indoor and outdoor scenes (some examples are shown
in Figures 11-13). The stereo pairs were acquired using a Bumblebee
stereo camera from PointGrey, with a baseline of 24 cm
and image resolution of 1024 x 768 pixels. The scenes contain
mostly planar surfaces, including a variety of situations that are
difficult for traditional stereo methods, e.g. low and/or repetitive
textures, and surface slant.

6.1. Compared Algorithms

The output of our algorithm (SymS) is a discrete set of plane
hypotheses $\mathcal{P}^{SymS}$ and a semi-dense 3D reconstruction. We
compare these plane hypotheses with the ones obtained using
two different approaches.

The first applies dense stereo (DS) for PPR and was proposed
by Gallup et al. [10]. The authors start by obtaining
a dense depth map with respect to the left view using local
stereo. Then, plane hypotheses are generated using a sequential
RANSAC procedure over the disparity map (refer to [10]
for details). Finally, a plane linking step is performed for combining
nearby planes and/or single planes that are disjoint in the
image. The output of this algorithm is the set $\mathcal{P}^{DS}$ of plane
hypotheses and a dense PPR.

(a) (b) (c)

Figure 10: Comparison between independent line cut reconstruction and semi-dense PPR. For each example, we show the (independent) detection results along 5
virtual cut planes (left), and the final labeling results of the semi-dense PPR for 25 cut planes (right).

The second approach that is compared was proposed by
Sinha et al. [9], and is based on sparse stereo (SS). It detects and
computes sparse correspondences, line segments and vanishing
directions from the images. From these data, plane hypotheses
are generated using specific histogram voting and RANSAC
procedures. The output is the set $\mathcal{P}^{SS}$ and a sparse PPR composed
of 3D points and 3D line segments.

6.2. Accuracy analysis and parameter tuning 

The objective is to compare the performance of DS, SymS
and SS for generating plane hypotheses for the MRF plane labeling
described in Section 2.3. Concerning the accuracy analysis,
it is difficult to obtain the ground truth (GT) model parameters
for each stereo pair of the dataset, since this would involve
an error-prone and time-consuming manual selection of point
matches in the stereo views. We therefore decided to use a different
indicator for measuring the accuracy.

For each stereo pair, we manually define the planar region
$\mathcal{R}_k$ in the left view $I$ that is associated to a particular plane $\Omega_k$
in the scene (see Figure 11). Given the pixel-wise plane labeling
$\mathbf{f}$ that was computed using the plane hypotheses generated
by the algorithms described in Section 6.1, the accuracy of
the estimation of $\Omega_k$ is evaluated using the following metric:

$$ P_k = \frac{1}{\#\mathcal{R}_k} \sum_{p \in \mathcal{R}_k} \rho_p(f_p), $$

where $\#\mathcal{R}_k$ is the number of pixels in the region. Note that
the accuracy analysis using $P_k$ must be performed with caution.
There is no guarantee that $P_k < P_j$ means the plane $\Omega_k$ was
better estimated than $\Omega_j$. The proposed metric depends largely
on the textures and illumination of the surfaces, e.g. planar
surfaces with low texture and specularities will have a large $P_k$
even though the corresponding plane model is well estimated.
On the other hand, we are of the opinion that the metric $P_k$ is
adequate for comparing different estimations of the same plane
$\Omega_k$.
Assume that we use two different algorithms for obtaining
two different sets of plane hypotheses, say $\mathcal{P}^{A1}$ and $\mathcal{P}^{A2}$, which
are used as input to the global plane labeling described in Section 2.3.
After the graph-cut optimization, we have the assignments
$\mathbf{f}^{A1}$ from $\mathcal{P}^{A1}$ and $\mathbf{f}^{A2}$ from $\mathcal{P}^{A2}$ for each image pixel.
Following this, we can compute for each GT plane $\Omega_k$ the
photo-consistency metrics $P_k^{A1}$ and $P_k^{A2}$. In case $P_k^{A1} < P_k^{A2}$,
the first algorithm generated a plane hypothesis that better
fits the input images, which most probably means that $\Omega_k^{A1}$ is
more accurate than $\Omega_k^{A2}$. We noticed in practice that this empirical
comparison is a very good accuracy indicator in real-world
scenarios.
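The comparison procedure can be illustrated with the following Python sketch. It assumes that the per-pixel photo-consistency cost is available as a callable `cost(pixel, label)`, that a labeling is a dictionary from pixels to plane labels, and that the region is a list of pixels. These names and data structures are hypothetical.

```python
def mean_photo_consistency(region_pixels, labeling, cost):
    """Mean photo-consistency cost of a labeling inside one GT region."""
    total = sum(cost(pixel, labeling[pixel]) for pixel in region_pixels)
    return total / len(region_pixels)


def better_hypotheses(region_pixels, labeling_a1, labeling_a2, cost):
    """Return which of two labelings fits the images better inside the region."""
    p_a1 = mean_photo_consistency(region_pixels, labeling_a1, cost)
    p_a2 = mean_photo_consistency(region_pixels, labeling_a2, cost)
    if p_a1 < p_a2:
        return "A1"
    if p_a2 < p_a1:
        return "A2"
    return "tie"
```

Because both values are computed over the same region, texture and illumination affect them equally, which is why the comparison is only meaningful between estimates of the same plane.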

The parameters that are used in the different algorithms were 
manually tuned using the GT labeling on a subset of stereo pairs 
of the dataset, whose results are not shown in the experimental 
comparison. These values are kept constant for all the remain- 
ing experiments. Concerning our SymS algorithm, we decided 
to use M = 25 virtual cut planes for the best compromise be- 
tween accuracy and runtime. Concerning the MRF labeling (see 
Section 2.3), the parameters are constant and the same for all 
three plane hypotheses generators, namely $\rho_{max} = 0.8$, $\gamma = 0.6$,
m = 1 and M = 2.

6.3. Comparison results 

The dense PPR results obtained using DS, SymS and SS as 
plane hypotheses generators for the pixel-wise plane labeling 
are shown in Figure 11.

In the first two examples, the scene is composed of two and
three planes, respectively, which are mostly fronto-parallel to
the cameras. In these cases, the three algorithms work well and
provide similar results. SS has some problems
distinguishing the vertical planes in example (b), which is
mainly due to the lack of features on the wall on the right. Both
examples shown in the second row contain, besides other planes,
a highly slanted surface (blue in example (c) and green in
example (d)). Our algorithm is able to detect and accurately
reconstruct these surfaces, whereas DS and SS clearly have
difficulties handling this amount of slant. Examples (e) and (f)
show scenes containing many planes at different distances from
the camera. SymS is able to detect all the planes and provides
the most accurate plane hypotheses, being less sensitive to the
surface-camera distance when compared to DS and SS.

The last row shows two examples containing scenes with difficult
textures and illumination conditions. SS is not able to provide
acceptable plane hypotheses for the MRF labeling, so that
no plane assignment is obtained. DS is still able to cope with
the complicated texture of example (g), but completely fails in
example (h), where the joint effect of high slant and complicated
textures is a major challenge for dense stereo matching.
Our approach recovers all the planes, and can even distinguish
the close planes of example (g) corresponding to the floor and
the carpet.
Figure 11: Comparison between DS, SymS and SS for PPR. For each example we show (top, left) $I$ with GT labeling, different colors correspond to different
planes; (top, right) mean photo-consistency $P$ in the GT region for each algorithm, each color identifies a particular plane; and (bottom) pixel-wise plane assignment
obtained using the different algorithms as plane hypotheses generators, different colors identify different planes. The black label refers to the discard label $f_\emptyset$. As
an additional accuracy indicator, we manually identified for the example (e) the planes that are mutually orthogonal (e.g. blue and red) and parallel (e.g. green and
red). We present the mean angles $\theta_\perp$ and $\theta_\parallel$ between the perpendicular and the parallel planes, respectively.
Finally, and for the sake of completeness, the runtimes (without
the final MRF labeling) for each algorithm on the images
shown in Figure 11 are: 1-2 min for SymS (the runtime mostly
depends on the number of line cuts that are estimated (Section 3)),
2 min for DS, and approximately 3 min for SS. These
are straightforward and unoptimized implementations in Matlab,
except for the α-expansion optimization, for which the publicly
available C++ code of [31, 32, 33, 34] is used.

6.4. Two view piecewise models 

As discussed in [35], the depth error in stereo vision is related
to the correspondence error by a multiplication factor known
as the geometric resolution, which depends on the baseline and on
the focal length. We assume that the maximum allowed relative
depth error should be 2%. From our evaluation, this value is
reached, for the case of our algorithm and in the images shown
in Figure 11, at a depth of around 12 m. This will be our depth
reconstruction limit, so that we will not reconstruct surfaces
beyond this bound.
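To make the relation between depth and correspondence error concrete, the following Python sketch uses the standard first-order model for a rectified stereo pair, $Z = f b / d$, which gives $|\delta Z| \approx Z^2 |\delta d| / (f b)$. The exact definition of geometric resolution used in [35] may differ, and the focal length of 1000 pixels mentioned below is a purely hypothetical value, since the focal length of the camera is not stated in the text.

```python
def relative_depth_error(depth_m, baseline_m, focal_px, disparity_error_px):
    """First-order relative depth error |dZ| / Z for a rectified stereo pair.

    From Z = f*b/d it follows that |dZ| ~ Z**2 * |dd| / (f*b),
    hence |dZ| / Z ~ Z * |dd| / (f*b).
    """
    return depth_m * disparity_error_px / (focal_px * baseline_m)


def max_depth_for_error(max_relative_error, baseline_m, focal_px,
                        disparity_error_px):
    """Largest depth at which the relative depth error stays within the bound."""
    return max_relative_error * focal_px * baseline_m / disparity_error_px
```

Under this model, and with the 24 cm baseline of the dataset, a hypothetical focal length of 1000 pixels would mean that a 2% relative error at 12 m corresponds to a correspondence error of 0.4 pixels. The relative error grows linearly with depth, which is why a fixed error bound translates into a maximum reconstruction distance.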

Figure 12 and Figure 13 show plane labeling and 3D reconstruction
results in indoor and outdoor scenes, respectively.
These are the types of environments targeted by the PPR algorithms
described in [14, 8, 9, 10]. While these methods require multiple
views, our approach is able to reach competitive results using
only a stereo pair. The labeling results are exclusively based
on photo-consistency and proximity, which explains the poorly
defined region borders in some examples. Such an issue can be
solved using a more sophisticated pixel-wise plane labeling
MRF, similar to the one used in Section 5, that incorporates
crease and occlusion edge information. We chose not to do so in
order to better assess the accuracy of our plane pose estimation.



7. Conclusion 

The paper presents an automatic piecewise-planar reconstruction
algorithm from two views. Unlike other existing approaches,
the stereo depth estimation and the detection of planar
surfaces are accomplished in a tightly coupled manner
by combining SymStereo with PEARL [2]. This makes it possible to
take full advantage of the strong planarity prior, with the algorithm
being able to accurately segment and reconstruct the
planes contained in the scene. The effectiveness of the scheme
is demonstrated by comparison with two different state-of-the-art
approaches in several challenging indoor and outdoor scenarios.

As a final comment, it can be claimed that the energy-based 
model fitting can either be applied to dense stereo reconstruc- 
tion or to a sparse point-cloud model. The former would sub- 
stantially increase the computational complexity without bring- 
ing obvious benefits, while the latter would avoid the use of the 
smoothness term for regularizing the PEARL energy minimiza- 
tion. Thus, the symmetry-based semi-dense stereo provides a 
trade-off between the two, playing a key role in the success of 
the overall approach. 
(a) Stereo pair (b) Plane labeling (c) Textured 3D reconstruction

Figure 12: Indoor results produced by our PPR algorithm.